Smart Paper Notes

PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on

Zhenyu Xie , Zaiyu Huang , Fuwei Zhao , Haoye Dong , Michael Kampffmeyer , Xin Dong , Feida Zhu , Xiaodan Liang

Category: Computer Vision

2022-07-27

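Image-based virtual try-on is one of the most promising applications of human-centric image generation because of its tremendous real-world potential. In this work, we take a step forward in exploring versatile virtual try-on solutions, which we argue should have three main properties: support for unsupervised training, arbitrary garment categories, and controllable garment editing. To this end, we propose a characteristic-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN++ (PASTA-GAN++), to achieve a versatile system for high-resolution unpaired virtual try-on. Specifically, PASTA-GAN++ consists of an innovative patch-routed disentanglement module that decouples the intact garment into normalized patches. These patches retain the garment's style information while eliminating its spatial information, which alleviates overfitting during unsupervised training. Furthermore, PASTA-GAN++ introduces a patch-based garment representation and a patch-guided parsing synthesis block, allowing it to handle arbitrary garment categories and to support local garment editing. Finally, to obtain try-on results with realistic texture details, PASTA-GAN++ incorporates a novel spatially-adaptive residual module that injects the coarsely warped garment features into the generator. Extensive experiments on our newly collected UnPaired virtual Try-on (UPT) dataset demonstrate the superiority of PASTA-GAN++ over existing state-of-the-art methods and its ability to perform controllable garment editing.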

translated by Google Translate

Mind the Gap: Cross-Lingual Information Retrieval with Hierarchical Knowledge Enhancement

Fuwei Zhang , Zhao Zhang , Xiang Ao , Dehong Gao , Fuzhen Zhuang , Yi Wei , Qing He

Category: Machine Learning

2021-12-27

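Cross-Lingual Information Retrieval (CLIR) aims to rank documents written in a language different from that of the user's query. The intrinsic gap between languages is a fundamental challenge for CLIR. In this paper, we introduce a multilingual knowledge graph (KG) into the CLIR task, because it carries rich information about entities in multiple languages. We regard it as a "silver bullet" that simultaneously provides explicit alignment between queries and documents and broadens the representation of queries. We propose a model for this task named CLIR with HIerarchical Knowledge Enhancement (HIKE). The proposed model encodes the textual information in queries, documents, and the KG with multilingual BERT, and incorporates the KG information into the query-document matching process through a hierarchical information fusion mechanism. In particular, HIKE first integrates the entities in the KG and their neighborhoods into the query representation through knowledge-level fusion, and then combines knowledge from the source language through language-level fusion to further narrow the language gap. Finally, experimental results show that HIKE achieves substantial improvements over state-of-the-art competitors.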

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN

Zhenyu Xie , Zaiyu Huang , Fuwei Zhao , Haoye Dong , Michael Kampffmeyer , Xiaodan Liang

Category: Computer Vision

2021-11-20

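Image-based virtual try-on is one of the most promising applications of human-centric image generation because of its tremendous real-world potential. However, since most existing approaches fit in-shop garments onto a target person, they require the laborious and restrictive construction of paired training datasets, which severely limits their scalability. A few recent works attempt to transfer garments directly from one person to another, removing the need to collect paired datasets, but their performance suffers from the lack of paired (supervised) information. In particular, disentangling the style and spatial information of a garment becomes a challenge, which existing methods address only by requiring auxiliary data or extensive online optimization procedures, and this still inhibits their scalability. To achieve a scalable virtual try-on system that can transfer arbitrary garments between a source and a target person in an unsupervised manner, we propose a texture-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN (PASTA-GAN), which facilitates real-world unpaired virtual try-on. Specifically, to disentangle the style and spatial information of each garment, PASTA-GAN includes an innovative patch-routed disentanglement module that successfully retains garment texture and shape characteristics. Guided by the source person's keypoints, the patch-routed disentanglement module first decouples the garment into normalized patches, thereby eliminating the garment's inherent spatial information, and then reconstructs the normalized patches into a warped garment that conforms to the target person's pose. Given the warped garment, PASTA-GAN further introduces novel spatially-adaptive residual blocks that guide the generator to synthesize more realistic garment details.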
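
The idea of decoupling a garment into normalized patches can be illustrated with a minimal sketch. It is not the authors' module: the box format, the nearest-neighbour resampling, and all function names below are assumptions made for illustration, and the boxes are taken as given rather than derived from keypoints. The sketch only shows why resampling every garment region to a fixed size keeps its texture while discarding where it sat in the image and how large it was.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical format


def crop(img: Image, box: Box) -> Image:
    """Cut the rectangle described by `box` out of the image."""
    top, left, height, width = box
    return [row[left:left + width] for row in img[top:top + height]]


def resize_nearest(patch: Image, size: int) -> Image:
    """Resample a patch to size x size with nearest-neighbour lookup."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def normalize_patches(img: Image, boxes: List[Box], size: int = 2) -> List[Image]:
    """Turn each garment region into a fixed-size patch, discarding its
    original position and scale."""
    return [resize_nearest(crop(img, box), size) for box in boxes]


# The same 2x2 texture, small in the top-left corner of one image ...
small = [[1, 2, 0, 0, 0, 0],
         [3, 4, 0, 0, 0, 0]] + [[0] * 6 for _ in range(4)]
# ... and magnified 2x in the bottom-right corner of another.
large = [[0] * 6 for _ in range(2)] + [[0, 0, 1, 1, 2, 2],
                                       [0, 0, 1, 1, 2, 2],
                                       [0, 0, 3, 3, 4, 4],
                                       [0, 0, 3, 3, 4, 4]]

patch_a = normalize_patches(small, [(0, 0, 2, 2)])[0]
patch_b = normalize_patches(large, [(2, 2, 4, 4)])[0]
print(patch_a == patch_b)  # both normalize to [[1, 2], [3, 4]]
```

Because the two normalized patches are identical, a model that sees only such patches cannot read the source pose from them, which is the intuition behind using them to limit overfitting when no paired data is available.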

Adding Context to Source Code Representations for Deep Learning

Fuwei Tian , Christoph Treude

Category: Machine Learning

2022-07-30

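Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarization, and bug and vulnerability detection. To apply deep learning to these tasks, source code must be represented in a format suitable as input to a deep learning model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs), focus only on the code itself and do not take into account additional context that could be useful to deep learning models. In this paper, we argue that deep learning models benefit from access to additional contextual information about the code being analyzed. We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model on two software engineering tasks. We outline our research agenda for adding further contextual information to source code representations for deep learning.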
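
As a rough illustration of what "context from the call hierarchy" could mean in practice, the sketch below collects the direct callers and callees of a function and concatenates them with the function's own text. It is an assumption-laden toy, not the paper's pipeline: it analyzes Python sources with the standard `ast` module, resolves only plain-name calls between top-level functions, and the `<caller>`, `<target>`, and `<callee>` separator tokens are invented here.

```python
import ast
from typing import Dict, Set


def call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    graph: Dict[str, Set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                sub.func.id
                for sub in ast.walk(node)
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name)
            }
    return graph


def encode_with_context(source: str, target: str) -> str:
    """Concatenate callers, the target function, and callees into one input.

    The <caller>/<target>/<callee> markers are made-up separator tokens.
    """
    funcs = {n.name: ast.get_source_segment(source, n)
             for n in ast.parse(source).body
             if isinstance(n, ast.FunctionDef)}
    graph = call_graph(source)
    callers = sorted(f for f, called in graph.items()
                     if target in called and f != target)
    callees = sorted(c for c in graph[target] if c in funcs and c != target)
    parts = [f"<caller>\n{funcs[c]}" for c in callers]
    parts.append(f"<target>\n{funcs[target]}")
    parts += [f"<callee>\n{funcs[c]}" for c in callees]
    return "\n".join(parts)


SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

# `parse` is called by `main` and calls no other top-level function.
print(encode_with_context(SAMPLE, "parse"))
```

The resulting string could then be tokenized like any other code input, so a model sees the function together with its surrounding callers and callees rather than in isolation.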

Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment

Liqun Lin , Yang Zheng , Weiling Chen , Chengdong Lan , Tiesong Zhao

Category: Computer Vision

2023-01-03

Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade visual quality. Subjective and objective measures capable of identifying and quantifying the various types of PEAs are critical for improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e., blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e., flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with low computational cost and higher consistency with human visual perception. For temporal artifacts, we improve the self-attention-based TimeSFormer to detect them. Based on the six types of PEAs, we propose a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao , Huaying Tang , Xinglong Mao , Shifeng Liu , Hanqing Tao , Hao Wang , Tong Xu , Enhong Chen

Category: Computer Vision

2023-01-03

As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Automatic ME recognition (MER) is therefore becoming increasingly crucial in affective computing, and it provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Several spontaneous ME datasets have recently been built to alleviate this problem, but these efforts remain small in scale. To address this data hunger, we construct DFME (Dynamic Facial Micro-expressions), a dynamic spontaneous ME dataset with the largest ME data scale to date. It includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, to provide a valuable reference for future research. The comprehensive experimental results show that DFME can facilitate research on automatic MER and provide a new benchmark for it. DFME will be published via https://mea-lab-421.github.io.

Surveillance Face Anti-spoofing

Hao Fang , Ajian Liu , Jun Wan , Sergio Escalera , Chenxu Zhao , Xu Zhang , Stan Z. Li , Zhen Lei

Category: Computer Vision

2023-01-03

Face Anti-spoofing (FAS) is essential for securing face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured in 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In such scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network that alleviates the performance degradation caused by image quality in three ways: (1) an Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network; (2) generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation; (3) a Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.

EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

Mingzhe Li , Xiuying Chen , Weiheng Liao , Yang Song , Tao Zhang , Dongyan Zhao , Rui Yan

Category: Natural Language Processing

2023-01-03

The interview is regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice with mock interviews among themselves. However, a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like real interviewers. With the rapid growth of online recruitment in recent years, recruiters tend to hold interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but remain low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector from the dialog generator, so that most parameters can be trained with ungrounded dialogs and with resume data, neither of which is low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.

Follow the Timeline! Generating Abstractive and Extractive Timeline Summary in Chronological Order

Xiuying Chen , Mingzhe Li , Shen Gao , Zhangming Chan , Dongyan Zhao , Xin Gao , Xiangliang Zhang , Rui Yan

Category: Natural Language Processing

2023-01-02

Nowadays, time-stamped web documents related to a general news query flood the Internet, and timeline summarization aims to concisely summarize the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder, to ensure the chronological order of the abstractive summary, we propose to extract the event-level attention features in the generation process, with sequential information retained, and to use them to simulate the evolutionary attention of the ground-truth summary. The event-level attention can also be used to assist in extracting the summary, so that the extracted summary also follows the time sequence. We augment the previous large-scale Chinese timeline summarization dataset and collect a new English timeline dataset. Extensive experiments on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in both automatic and human evaluations.

Fusing Models for Prognostics and Health Management of Lithium-Ion Batteries Based on Physics-Informed Neural Networks

Pengfei Wen , Zhi-Sheng Ye , Yong Li , Shaowei Chen , Shuai Zhao

Category: Artificial Intelligence | Machine Learning

2023-01-02

For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. The existing empirical or physical models can reveal important information about the degradation dynamics. However, there are no general and flexible methods to fuse the information represented by those models. The Physics-Informed Neural Network (PINN) is an efficient tool for fusing empirical or physical dynamic models with data-driven models. To take full advantage of the various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical, semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of lithium iron phosphate (LFP)/graphite batteries.
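
The abstract does not give the formula behind its uncertainty-based adaptive weighting, so the sketch below assumes one common formulation from the multi-task learning literature: each task loss L_i is scaled by a learnable log-variance s_i, giving a total of the sum over i of 0.5 * exp(-s_i) * L_i + 0.5 * s_i. The function names are hypothetical, and the task losses are held fixed only to make the behaviour easy to inspect; in real training they change at every step and the s_i are updated alongside the network weights.

```python
import math
from typing import List


def weighted_total(losses: List[float], log_vars: List[float]) -> float:
    """Total loss: sum of 0.5 * exp(-s_i) * L_i + 0.5 * s_i."""
    return sum(0.5 * math.exp(-s) * l + 0.5 * s
               for l, s in zip(losses, log_vars))


def grad_log_vars(losses: List[float], log_vars: List[float]) -> List[float]:
    """Derivative of the total loss with respect to each s_i."""
    return [0.5 - 0.5 * math.exp(-s) * l for l, s in zip(losses, log_vars)]


def fit_log_vars(losses: List[float], steps: int = 2000,
                 lr: float = 0.1) -> List[float]:
    """Plain gradient descent on the log-variances for fixed task losses."""
    log_vars = [0.0] * len(losses)
    for _ in range(steps):
        grads = grad_log_vars(losses, log_vars)
        log_vars = [s - lr * g for s, g in zip(log_vars, grads)]
    return log_vars


# Hypothetical losses: a large data-fitting term and a small PDE-residual term.
losses = [4.0, 0.25]
log_vars = fit_log_vars(losses)
weights = [math.exp(-s) for s in log_vars]
print([round(w, 3) for w in weights])  # each weight approaches 1 / loss
```

Setting the derivative to zero gives exp(s_i) = L_i, so under this formulation a task with a larger loss is automatically down-weighted and no single term dominates the total.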
